load balancing
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- North America > United States > Virginia > Alexandria County > Alexandria (0.04)
- (7 more...)
Load Balancing for AI Training Workloads
McClure, Sarah, Ratnasamy, Sylvia, Shenker, Scott
We investigate the performance of various load balancing algorithms for large-scale AI training workloads running on dedicated infrastructure. The performance of load balancing depends on both the congestion control and loss recovery algorithms, so our evaluation also sheds light on the appropriate design choices for those components.
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence (1.00)
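The kind of trade-off such an evaluation covers can be illustrated with two classic flow-placement strategies, static hashing versus load-aware assignment; the function names and link model below are hypothetical sketches, not the paper's algorithms:

```python
import hashlib

def ecmp_pick(flow_id: str, n_links: int) -> int:
    """Static ECMP-style placement: a flow is pinned to one link by hashing
    its identifier, so all its packets stay in order on the same path."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links

def least_loaded_pick(link_loads: list[float]) -> int:
    """Dynamic alternative: send the next flow to the currently lightest link."""
    return min(range(len(link_loads)), key=lambda i: link_loads[i])
```

Hash-based placement is stateless and order-preserving but can collide heavy flows onto one link; load-aware placement avoids that at the cost of tracking per-link state.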
AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study
- This paper addresses the critical need for efficient energy management in healthcare facilities, where fluctuating energy demands challenge both operational and sustainability goals. Traditional energy management methods often fall short in healthcare settings, leading to inefficiencies and increased costs. To address this, the paper explores AI-driven approaches for demand forecasting and load balancing, introducing a novel integration of LSTM (Long Short-Term Memory), genetic algorithm, and SHAP (Shapley Additive Explanations) specifically tailored for healthcare energy management. While LSTM has been widely used for time-series forecasting, its application in healthcare energy demand prediction is underexplored. Here, LSTM is demonstrated to significantly outperform ARIMA and Prophet models in handling complex, non-linear demand patterns. Results show that LSTM achieved a Mean Absolute Error (MAE) of 21.69 and a Root Mean Square Error (RMSE) of 29.96, significantly improving upon Prophet (MAE: 59.78, RMSE: 81.22) and ARIMA (MAE: 87.73, RMSE: 125.22), highlighting its superior forecasting capability. The genetic algorithm is employed not only for optimising forecasting model parameters but also for dynamically improving load balancing strategies, ensuring adaptability to real-time energy fluctuations. Additionally, SHAP analysis is used to interpret the models and understand the impact of various input features on predictions, enhancing model transparency and trustworthiness in energy decision-making. The combined LSTM-GA-SHAP approach offers a comprehensive framework that improves forecasting accuracy, enhances energy efficiency, and supports sustainability in healthcare environments. Future work could focus on real-time implementation and further hybridisation with reinforcement learning for continuous optimisation.
This study establishes a strong foundation for leveraging AI in healthcare energy management, showcasing its potential for scalability, efficiency, and resilience. Introduction: Australia has substantial capacity for renewable energy across its regions (Holloway, R., 2023; Rahimi et al., 2025), and the Australian healthcare system plays a major role in using renewable energies. Optimising energy use in healthcare systems is essential due to the high and often unpredictable energy demands needed to run medical equipment, keep environmental conditions stable, and support constant patient care.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.52)
- Oceania > Australia > Western Australia > Perth (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (12 more...)
- Health & Medicine (1.00)
- Energy > Power Industry (1.00)
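The MAE and RMSE figures quoted in the abstract follow the standard definitions, which can be computed for any forecast; this is a generic sketch, not the paper's code:

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the forecast errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: penalises large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Because RMSE squares each error before averaging, a model with occasional large misses shows a bigger gap between its RMSE and MAE, which is consistent with the wider RMSE spread reported for ARIMA above.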
Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks
Singh, Kamal, Marouani, Sami, Sheikh, Ahmad Al, Quang, Pham Tran Anh, Habrard, Amaury
As load and delta load increase, the policy puts more flows on the Internet link. Increasing Internet delay puts the flows on MPLS. The contribution of Internet loss seems counterintuitive, as it appears to put more load on the Internet link. However, even if its coefficient is near 1.0, the overall contribution of the term is negligible compared to that of load, because loss in our scenario varies from 0 to around 0.15. This applies to delay too. For minimising loss, we extract the following: a = 1.9 − 1.1/(2λ₃ + 1)² + 2λᵢ/5 − 10dᵢ/3 − uᵢ/10 (4). This policy can be interpreted as follows, and we may refer to Figure 1 as well. The ratio starts near 0.8; increasing load, with increasing delta, puts more traffic on the Internet link, while increasing Internet delay and Internet link utilisation slightly shifts the balance towards the MPLS link. Distillation of symbolic equations of PPO policy: in this method, we train a policy using PPO, generate trajectory data, and then generate the symbolic equations using auto-regressive models [22].
- Europe > France (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States (0.04)
- Energy (0.49)
- Telecommunications (0.48)
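A split policy of the interpretable form described in this abstract can be evaluated directly. The coefficients below are illustrative assumptions, chosen only so that the ratio starts near 0.8 at zero load and responds to load, delta, delay, and utilisation in the directions the text describes; they are not the paper's extracted values:

```python
def split_ratio(load, delta, delay, util):
    """Illustrative interpretable split policy: fraction of traffic placed on
    the Internet link (the remainder goes to MPLS). All coefficients are
    assumptions for demonstration, tuned so the ratio is ~0.8 when all
    inputs are zero."""
    a = 1.9 - 1.1 / (2 * load + 1) ** 2 + 2 * delta / 5 - 10 * delay / 3 - util / 10
    return min(max(a, 0.0), 1.0)  # clamp to a valid traffic fraction
```

Such closed-form policies are what makes the symbolic-distillation approach interpretable: each input's influence on the split is visible directly in its term.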
A transformer-based deep q learning approach for dynamic load balancing in software-defined networks
Owusu, Evans Tetteh, Agyekum, Kwame Agyemang-Prempeh, Benneh, Marinah, Ayorna, Pius, Agyemang, Justice Owusu, Colley, George Nii Martey, Gazde, James Dzisi
This study proposes a novel approach for dynamic load balancing in Software-Defined Networks (SDNs) using a Transformer-based Deep Q-Network (DQN). Traditional load balancing mechanisms, such as Round Robin (RR) and Weighted Round Robin (WRR), are static and often struggle to adapt to fluctuating traffic conditions, leading to inefficiencies in network performance. In contrast, SDNs offer centralized control and flexibility, providing an ideal platform for implementing machine learning-driven optimization strategies. The core of this research combines a Temporal Fusion Transformer (TFT) for accurate traffic prediction with a DQN model to perform real-time dynamic load balancing. The TFT model predicts future traffic loads, which the DQN uses as input, allowing it to make intelligent routing decisions that optimize throughput, minimize latency, and reduce packet loss. The proposed model was tested against RR and WRR in simulated environments with varying data rates, and the results demonstrate significant improvements in network performance. For the 500MB data rate, the DQN model achieved an average throughput of 0.275 compared to 0.202 and 0.205 for RR and WRR, respectively. Additionally, the DQN recorded lower average latency and packet loss. In the 1000MB simulation, the DQN model outperformed the traditional methods in throughput, latency, and packet loss, reinforcing its effectiveness in managing network loads dynamically. This research presents an important step towards enhancing network performance through the integration of machine learning models within SDNs, potentially paving the way for more adaptive, intelligent network management systems.
- Africa > Ghana > Ashanti > Kumasi (0.05)
- Asia > Singapore (0.04)
- North America > United States (0.04)
- Asia > India > Uttar Pradesh (0.04)
- Telecommunications > Networks (1.00)
- Information Technology (1.00)
- Energy > Power Industry (1.00)
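The static baselines this paper compares against, Round Robin and Weighted Round Robin, are simple to sketch; server names and weights below are illustrative, not from the paper's testbed:

```python
from itertools import cycle

def round_robin(servers):
    """Static RR: cycle through servers in order, ignoring their load."""
    return cycle(servers)

def weighted_round_robin(weights):
    """Static WRR: each server appears in the repeating schedule in
    proportion to its configured weight."""
    schedule = [s for s, w in weights.items() for _ in range(w)]
    return cycle(schedule)
```

Both schedules are fixed at configuration time, which is exactly the limitation the abstract's learning-based approach targets: neither reacts to fluctuating traffic.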
Safe Load Balancing in Software-Defined-Networking
Dinh, Lam, Quang, Pham Tran Anh, Leguay, Jérémie
High performance, reliability and safety are crucial properties of any Software-Defined-Networking (SDN) system. Although the use of Deep Reinforcement Learning (DRL) algorithms has been widely studied to improve performance, their practical applications are still limited as they fail to ensure safe operations in exploration and decision-making. To fill this gap, we explore the design of a Control Barrier Function (CBF) on top of DRL algorithms for load-balancing. We show that our DRL-CBF approach is capable of meeting safety requirements during training and testing while achieving near-optimal performance in testing. We provide results using two simulators: a flow-based simulator, which is used for proof-of-concept and benchmarking, and a packet-based simulator that implements real protocols and scheduling. Thanks to the flow-based simulator, we compared the performance against the optimal policy, obtained by solving a Non-Linear Programming (NLP) problem with the SCIP solver. Furthermore, we showed that pre-trained models from the flow-based simulator, which is faster, can be transferred to the packet simulator, which is slower but more accurate, with some fine-tuning. Overall, the results suggest that near-optimal Quality-of-Service (QoS) performance in terms of end-to-end delay can be achieved while safety requirements related to link capacity constraints are guaranteed. In the packet-based simulator, we also show that our DRL-CBF algorithms outperform non-RL baseline algorithms. When the models are fine-tuned over a few episodes, we achieve smoother QoS and safety in training, and similar performance in testing, compared to models trained from scratch.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (4 more...)
- Telecommunications > Networks (0.94)
- Information Technology (0.67)
- Energy > Power Industry (0.64)
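The safety-layer idea, minimally correcting a learned action so that link-capacity constraints always hold, can be sketched for a two-link traffic split. This is a simple projection illustrating the concept, not the paper's CBF formulation:

```python
def safe_split(proposed: float, demand: float, cap1: float, cap2: float) -> float:
    """Project a proposed split x (fraction of demand on link 1) onto the set
    where both capacity constraints hold: demand*x <= cap1 and
    demand*(1-x) <= cap2. The DRL agent's action is corrected as little as
    possible before being applied, so exploration can never violate safety."""
    lo = max(0.0, 1.0 - cap2 / demand)  # link 2 must absorb the rest
    hi = min(1.0, cap1 / demand)        # link 1 must not overflow
    if lo > hi:
        raise ValueError("demand exceeds total capacity; no safe action exists")
    return min(max(proposed, lo), hi)
```

The clamp acts like the barrier: inside the safe set the agent's action passes through unchanged, and only actions that would breach a capacity constraint are pulled back to its boundary.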
Reinforcement Learning-Based Adaptive Load Balancing for Dynamic Cloud Environments
Efficient load balancing is crucial in cloud computing environments to ensure optimal resource utilization, minimize response times, and prevent server overload. Traditional load balancing algorithms, such as round-robin or least connections, are often static and unable to adapt to the dynamic and fluctuating nature of cloud workloads. In this paper, we propose a novel adaptive load balancing framework using Reinforcement Learning (RL) to address these challenges. The RL-based approach continuously learns and improves the distribution of tasks by observing real-time system performance and making decisions based on traffic patterns and resource availability. Our framework is designed to dynamically reallocate tasks to minimize latency and ensure balanced resource usage across servers. Experimental results show that the proposed RL-based load balancer outperforms traditional algorithms in terms of response time, resource utilization, and adaptability to changing workloads. These findings highlight the potential of AI-driven solutions for enhancing the efficiency and scalability of cloud infrastructures.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
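A toy version of RL-based task placement can be sketched with tabular Q-learning; the state, reward, and hyperparameters below are illustrative assumptions, far simpler than a real cloud environment:

```python
import random

def q_learning_balancer(n_servers=2, tasks=5, episodes=3000,
                        alpha=0.2, gamma=0.9, eps=0.2, seed=0):
    """Learn task placement by trial and error. State: current server loads;
    action: which server gets the next task; reward: negative load imbalance,
    so balanced assignments are reinforced."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        loads = [0] * n_servers
        for _ in range(tasks):
            s = tuple(loads)
            qs = Q.setdefault(s, [0.0] * n_servers)
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(n_servers)
            else:
                a = max(range(n_servers), key=lambda i: qs[i])
            loads[a] += 1
            r = -(max(loads) - min(loads))
            q_next = Q.setdefault(tuple(loads), [0.0] * n_servers)
            qs[a] += alpha * (r + gamma * max(q_next) - qs[a])
    return Q

def greedy_assign(Q, n_servers=2, tasks=5):
    """Deploy the learned policy greedily (no exploration)."""
    loads = [0] * n_servers
    for _ in range(tasks):
        qs = Q.get(tuple(loads), [0.0] * n_servers)
        a = max(range(n_servers), key=lambda i: qs[i])
        loads[a] += 1
    return loads
```

The learned greedy policy ends up spreading tasks evenly, which is the behaviour the abstract's framework pursues at much larger scale with richer state (traffic patterns, resource availability).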
Shaping Rewards, Shaping Routes: On Multi-Agent Deep Q-Networks for Routing in Satellite Constellation Networks
Roth, Manuel M. H., Hegde, Anupama, Delamotte, Thomas, Knopp, Andreas
Effective routing in satellite mega-constellations has become crucial to facilitate the handling of increasing traffic loads, more complex network architectures, as well as the integration into 6G networks. To enhance adaptability as well as robustness to unpredictable traffic demands, and to solve dynamic routing environments efficiently, machine learning-based solutions are being considered. For network control problems, such as optimizing packet forwarding decisions according to Quality of Service requirements and maintaining network stability, deep reinforcement learning techniques have demonstrated promising results. For this reason, we investigate the viability of multi-agent deep Q-networks for routing in satellite constellation networks. We focus specifically on reward shaping and quantifying training convergence for joint optimization of latency and load balancing in static and dynamic scenarios. To address identified drawbacks, we propose a novel hybrid solution based on centralized learning and decentralized control.
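The reward-shaping idea for jointly optimising latency and load balancing can be sketched as a weighted penalty; the weights and terms below are illustrative assumptions, not the paper's shaped reward:

```python
def shaped_reward(latency_ms, link_utils, w_latency=1.0, w_balance=0.5):
    """Penalise end-to-end latency plus the spread of link utilisations.
    A lower latency and a more even utilisation both raise the reward, so an
    agent maximising it trades the two objectives off via the weights."""
    imbalance = max(link_utils) - min(link_utils)
    return -(w_latency * latency_ms + w_balance * imbalance)
```

How these weights are chosen, and how quickly agents converge under the resulting reward, is precisely the kind of question the abstract's convergence analysis addresses.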
Load Balancing in Federated Learning
Javani, Alireza, Wang, Zhiying
Federated Learning (FL) is a decentralized machine learning framework that enables learning from data distributed across multiple remote devices, enhancing communication efficiency and data privacy. Due to limited communication resources, a scheduling policy is often applied to select a subset of devices for participation in each FL round. The scheduling process confronts significant challenges due to the need for fair workload distribution, efficient resource utilization, scalability in environments with numerous edge devices, and statistically heterogeneous data across devices. This paper proposes a load metric for scheduling policies based on the Age of Information and addresses the above challenges by minimizing the load metric variance across the clients. Furthermore, a decentralized Markov scheduling policy is presented that ensures a balanced workload distribution while eliminating management overhead irrespective of the network size, thanks to independent client decision-making. We establish the optimal parameters of the Markov chain model and validate our approach through simulations. The results demonstrate that reducing the load metric variance not only promotes fairness and improves operational efficiency, but also enhances the convergence rate of the learning models.
- Information Technology > Security & Privacy (0.68)
- Energy > Power Industry (0.41)
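The two ingredients this abstract describes, a load-metric variance the scheduler minimises and independent Markov participation decisions, can be sketched as follows; using raw Age of Information as the per-client metric and the specific transition probabilities are illustrative assumptions, not the paper's construction:

```python
import random

def load_metric_variance(ages):
    """Variance of a per-client load metric (here simply each client's AoI:
    rounds since it last participated). The scheduler's objective is to keep
    this variance small, i.e. no client is persistently over- or under-used."""
    mean = sum(ages) / len(ages)
    return sum((a - mean) ** 2 for a in ages) / len(ages)

def markov_participate(currently_active, p_on, p_off, rng):
    """Decentralised two-state Markov decision: each client independently
    flips between participating and idle, so no central coordinator or
    per-round management overhead is needed."""
    if currently_active:
        return not (rng.random() < p_off)
    return rng.random() < p_on
```

Because every client runs its own chain, the scheduling cost does not grow with the network size; tuning p_on and p_off plays the role of the optimal Markov-chain parameters the paper derives.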